A product recall is a request to return, exchange, or replace a product after the manufacturer or a consumer protection organisation finds flaws that might impair performance, endanger customers, or create legal problems for the manufacturer. Dangerous or defective products frequently reach store shelves and are sold to customers; a recall is the process used to remove them from customers' hands. Recalls may also lead to product liability claims.
We defined the analytical problem of our project as follows: ‘To identify and analyze the reasons for product recalls in the food, medical devices, and other consumer goods industries and evaluate the effectiveness of the recall processes in ensuring public safety. The goal is to identify patterns and trends in recall data to inform regulatory and industry efforts to prevent future recalls and improve the overall safety of consumer products.’ The next step was to identify a dataset connected to our business problem and to check the quality of the data, its suitability, and its usefulness for answering that problem. Many such datasets are available for past years, but we wanted recent data in order to anticipate future recalls, so we examined the candidates, selected one, and carried out the data assessment described here.
This dataset, intended for recall prediction, comes from the U.S. Food and Drug Administration website. It is the latest available data and contains information about different product types, such as Biologics, Devices, Drugs, Food/Cosmetics, Tobacco, and Veterinary, along with their status, the reason they were recalled, and the state involved. We reviewed many datasets on websites such as:
where we found that food- and drug-related datasets are hard to come by and show no important correlation between their variables. The two links above do not have enough recall data and contain no numerical values. The link above has recall data for Canada and the U.S. We decided to analyze the U.S. food and drug recall data because it has 17 features in total, whereas the Open Canada website offers only a limited amount of data and does not distinguish product types. In addition, the Open Canada website only has past data, while the FDA provides current data across product types, which makes the analysis easier. With those problems and older datasets, we could not have identified the patterns needed for near-future recall trends.
We have a total of 81,697 records (rows) and 17 features (columns) in our dataset. It has almost no missing values: a single entry is missing in Distribution Pattern. The descriptions of the features are listed below:
1) FEI number: - The FEI number is a unique identifier assigned by the FDA to identify firms associated with FDA regulated products.
2) Recalling Firm Name: - the name of the firm that recalls the product from the market.
3) Product type: - the type of product that is recalled from the market.
4) Product classification: - the recall class assigned to the product, one of three. Class I: use of or exposure to the product is reasonably likely to cause serious adverse health consequences or death. Class II: the product may cause temporary or medically reversible adverse health consequences, or the probability of serious consequences is remote. Class III: the product is not likely to cause adverse health consequences.
5) Status: - the current status of the recall.
6) Distribution Pattern: - where the product was distributed.
7) Recalling Firm City, Recalling Firm State, Recalling Firm Country: - the location of the recalling firm.
8) Center Classification Date: - the date on which the recall was classified.
9) Reason for Recall: - the reason why the product was recalled.
10) Product Description: - a description of the recalled product (for example, the blood component in a biologics recall).
11) Event ID: - an identifier for the recall event. A single event can cover several recalled products.
12) Product ID: - the identification number of the recalled product.
13) Center: - the Center within the FDA that regulates the product, for example CBER, which regulates biological products for human use under applicable federal laws, including the Public Health Service Act and the Federal Food, Drug and Cosmetic Act.
14) Recall Details: - a URL where the full details of the recalled product can be viewed.
The data itself is fairly clean, with only one missing value, and the heat map shows that there is little correlation between the attributes.
The FDA is responsible for protecting public health by ensuring the safety, efficacy, and security of human and veterinary drugs, biological products, and medical devices. The dataset provides information on product recalls initiated by companies and manufacturers that are regulated by the FDA. It covers a variety of products, including drugs, biological products, medical devices, and food. The data is therefore sufficient to compare different product types by their number of recalls, and we can answer the research question from this dataset.
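As a rough illustration of the comparison described above, the following sketch counts recalled products per year and product type. The four-row `sample` frame is a hypothetical stand-in for the real file, and the `Year` column and `recalls_per_year` name are introduced here only for the example; the date format is assumed to be MM/DD/YYYY, as in the rows displayed by `data.head()`.

```python
import pandas as pd

# Hypothetical mini-sample with two of the dataset's columns.
sample = pd.DataFrame({
    "Product Type": ["Biologics", "Biologics", "Food/Cosmetics", "Devices"],
    "Center Classification Date": ["01/20/2023", "01/20/2023",
                                   "06/08/2012", "06/08/2012"],
})

# The dates are stored as text, so parse them with an explicit format.
sample["Center Classification Date"] = pd.to_datetime(
    sample["Center Classification Date"], format="%m/%d/%Y"
)
sample["Year"] = sample["Center Classification Date"].dt.year

# Rows = year, columns = product type, values = number of recalled products.
recalls_per_year = pd.crosstab(sample["Year"], sample["Product Type"])
print(recalls_per_year)
```

On the full dataset the same two steps (parse the date, then cross-tabulate) would give one row per year, which is a convenient starting point for looking at trends.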
Although all of the datasets used for this analytic project work were obtained from open data sources, we took the time to review the various terms and conditions of use for all of the data sources to ensure that all of the ethical guidelines, which include Consent, Clarity, Consistency, Control, and Consequences, are followed to the letter.
Consent: The FDA recalls dataset includes information on products and companies that are regulated by the agency. Some of this information may be sensitive, so it is important to consider the privacy and confidentiality of the individuals and companies involved in the recall process.
Clarity: The dataset contains well-structured and well-organized data, as well as detailed explanations of each recall, which support analyzing the data and predicting future trends.
Consistency: The data is consistent in the information included for each recall, so users can easily compare different recalls regardless of the type of product or the company responsible for the recall.
Control: The FDA is responsible for maintaining the accuracy and completeness of the data. The FDA may also take steps to protect the privacy and security of the data, such as redacting sensitive information or limiting access to the data to authorized users.
Consequences: The data is used to protect public health and to provide important information to consumers, health care professionals, and others. However, it's also important to consider the potential consequences of using the data, such as the risk of harm to consumers or the potential for misuse of the information.
# import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt
import warnings
warnings.filterwarnings("ignore")
data=pd.read_csv("recall.csv")
data.info()
<class 'pandas.core.frame.DataFrame'>

RangeIndex: 81697 entries, 0 to 81696

Data columns (total 17 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | FEI Number | 81697 non-null | object |
| 1 | Recalling Firm Name | 81697 non-null | object |
| 2 | Product Type | 81697 non-null | object |
| 3 | Product Classification | 81697 non-null | object |
| 4 | Status | 81697 non-null | object |
| 5 | Distribution Pattern | 81696 non-null | object |
| 6 | Recalling Firm City | 81697 non-null | object |
| 7 | Recalling Firm State | 81697 non-null | object |
| 8 | Recalling Firm Country | 81697 non-null | object |
| 9 | Center Classification Date | 81697 non-null | object |
| 10 | Reason for Recall | 81697 non-null | object |
| 11 | Product Description | 81697 non-null | object |
| 12 | Event ID | 81697 non-null | int64 |
| 13 | Event Classification | 81697 non-null | object |
| 14 | Product ID | 81697 non-null | int64 |
| 15 | Center | 81697 non-null | object |
| 16 | Recall Details | 81697 non-null | object |

dtypes: int64(2), object(15)

memory usage: 10.6+ MB
data.shape
(81697, 17)
data.describe()
| | Event ID | Product ID |
|---|---|---|
| count | 81697.000000 | 81697.000000 |
| mean | 75764.638163 | 151511.722793 |
| std | 9625.531055 | 28331.386121 |
| min | 32594.000000 | 40403.000000 |
| 25% | 68978.000000 | 129576.000000 |
| 50% | 76183.000000 | 152645.000000 |
| 75% | 83314.000000 | 175049.000000 |
| max | 91514.000000 | 198113.000000 |
data.head()
| | FEI Number | Recalling Firm Name | Product Type | Product Classification | Status | Distribution Pattern | Recalling Firm City | Recalling Firm State | Recalling Firm Country | Center Classification Date | Reason for Recall | Product Description | Event ID | Event Classification | Product ID | Center | Recall Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3008744215 | LifeSource | Biologics | Class II | Terminated | Illinois | Rosemont | Illinois | United States | 01/20/2023 | Blood products, where transfusion-transmitted ... | PF24 Plasma | 91509 | Class II | 198084 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 1 | 3008744215 | LifeSource | Biologics | Class II | Terminated | Illinois | Rosemont | Illinois | United States | 01/20/2023 | Blood products, where transfusion-transmitted ... | Red Blood Cells, Leukocytes Reduced | 91509 | Class II | 198085 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 2 | 1000517886 | Versiti Michigan Inc | Biologics | Class II | Terminated | Michigan | Traverse City | Michigan | United States | 01/20/2023 | Blood products, collected in a manner with a p... | Red Blood Cells, Leukocytes Reduced, Irradiated | 91510 | Class II | 198086 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 3 | 1000517886 | Versiti Michigan Inc | Biologics | Class II | Terminated | Michigan | Traverse City | Michigan | United States | 01/20/2023 | Blood products, collected in a manner with a p... | PF24 Plasma | 91510 | Class II | 198087 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 4 | 1000517886 | Versiti Michigan Inc | Biologics | Class II | Terminated | Michigan | Traverse City | Michigan | United States | 01/20/2023 | Blood products, collected in a manner with a p... | Red Blood Cells, Leukocytes Reduced | 91510 | Class II | 198088 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
print ("Rows : " ,data.shape[0])
print ("Columns : " ,data.shape[1])
print ("\nFeatures : \n" ,data.columns.tolist())
print ("\nUnique values : \n",data.nunique())
Rows : 81697

Columns : 17

Features :

['FEI Number', 'Recalling Firm Name', 'Product Type', 'Product Classification', 'Status', 'Distribution Pattern', 'Recalling Firm City', 'Recalling Firm State', 'Recalling Firm Country', 'Center Classification Date', 'Reason for Recall', 'Product Description', 'Event ID', 'Event Classification', 'Product ID', 'Center', 'Recall Details']

Unique values :

| Column | Unique values |
|---|---|
| FEI Number | 9814 |
| Recalling Firm Name | 8717 |
| Product Type | 6 |
| Product Classification | 3 |
| Status | 3 |
| Distribution Pattern | 17641 |
| Recalling Firm City | 2998 |
| Recalling Firm State | 54 |
| Recalling Firm Country | 49 |
| Center Classification Date | 2920 |
| Reason for Recall | 26301 |
| Product Description | 70862 |
| Event ID | 29689 |
| Event Classification | 3 |
| Product ID | 81697 |
| Center | 6 |
| Recall Details | 81697 |

dtype: int64
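The unique counts above show 29,689 distinct Event ID values for 81,697 Product ID values, so one recall event covers about 2.75 products on average. A minimal sketch of how that relationship could be inspected is given below; the `sample` frame reuses the five Event ID / Product ID pairs shown by `data.head()`, and the name `products_per_event` is introduced only for this example.

```python
import pandas as pd

# Event ID / Product ID pairs taken from the first five displayed rows.
sample = pd.DataFrame({
    "Event ID": [91509, 91509, 91510, 91510, 91510],
    "Product ID": [198084, 198085, 198086, 198087, 198088],
})

# Number of distinct recalled products attached to each recall event.
products_per_event = sample.groupby("Event ID")["Product ID"].nunique()
print(products_per_event)
print("Mean products per event:", products_per_event.mean())
```

This matters for later counting: tallying rows counts recalled products, while tallying distinct Event ID values counts recall events.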
data.isnull().sum()
| Column | Missing values |
|---|---|
| FEI Number | 0 |
| Recalling Firm Name | 0 |
| Product Type | 0 |
| Product Classification | 0 |
| Status | 0 |
| Distribution Pattern | 1 |
| Recalling Firm City | 0 |
| Recalling Firm State | 0 |
| Recalling Firm Country | 0 |
| Center Classification Date | 0 |
| Reason for Recall | 0 |
| Product Description | 0 |
| Event ID | 0 |
| Event Classification | 0 |
| Product ID | 0 |
| Center | 0 |
| Recall Details | 0 |

dtype: int64
# treat single-space cells as missing (np.nan; the np.NaN alias was removed in NumPy 2.0)
data = data.replace(" ", np.nan)
data.isnull().sum()
| Column | Missing values |
|---|---|
| FEI Number | 0 |
| Recalling Firm Name | 0 |
| Product Type | 0 |
| Product Classification | 0 |
| Status | 0 |
| Distribution Pattern | 1 |
| Recalling Firm City | 0 |
| Recalling Firm State | 0 |
| Recalling Firm Country | 0 |
| Center Classification Date | 0 |
| Reason for Recall | 0 |
| Product Description | 0 |
| Event ID | 0 |
| Event Classification | 0 |
| Product ID | 0 |
| Center | 0 |
| Recall Details | 0 |

dtype: int64
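The `replace` call above only converts cells that are exactly one space, which may be why the missing-value counts do not change. If blank entries could also appear as empty strings or runs of whitespace (an assumption, not something the output confirms), a regular-expression replace is one way to catch them. The sketch below uses a hypothetical five-row `sample` column to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing several kinds of "blank" entries.
sample = pd.DataFrame(
    {"Distribution Pattern": ["Nationwide", " ", "", "   ", "NV only."]}
)

# Exact match: only the cell that is exactly one space becomes NaN.
exact = sample.replace(" ", np.nan)

# Regex match: empty and whitespace-only cells all become NaN.
blank = sample.replace(r"^\s*$", np.nan, regex=True)

print("exact match :", exact["Distribution Pattern"].isnull().sum())
print("regex match :", blank["Distribution Pattern"].isnull().sum())
```

Values such as "NV only." contain a space but are left untouched in both cases, because the pattern is anchored to the whole cell.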
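# drop the single row with a missing Distribution Pattern; .copy() avoids modifying a view in later assignments
data = data[pd.notnull(data['Distribution Pattern'])].copy()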
data
| | FEI Number | Recalling Firm Name | Product Type | Product Classification | Status | Distribution Pattern | Recalling Firm City | Recalling Firm State | Recalling Firm Country | Center Classification Date | Reason for Recall | Product Description | Event ID | Event Classification | Product ID | Center | Recall Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3008744215 | LifeSource | Biologics | Class II | Terminated | Illinois | Rosemont | Illinois | United States | 01/20/2023 | Blood products, where transfusion-transmitted ... | PF24 Plasma | 91509 | Class II | 198084 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 1 | 3008744215 | LifeSource | Biologics | Class II | Terminated | Illinois | Rosemont | Illinois | United States | 01/20/2023 | Blood products, where transfusion-transmitted ... | Red Blood Cells, Leukocytes Reduced | 91509 | Class II | 198085 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 2 | 1000517886 | Versiti Michigan Inc | Biologics | Class II | Terminated | Michigan | Traverse City | Michigan | United States | 01/20/2023 | Blood products, collected in a manner with a p... | Red Blood Cells, Leukocytes Reduced, Irradiated | 91510 | Class II | 198086 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 3 | 1000517886 | Versiti Michigan Inc | Biologics | Class II | Terminated | Michigan | Traverse City | Michigan | United States | 01/20/2023 | Blood products, collected in a manner with a p... | PF24 Plasma | 91510 | Class II | 198087 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 4 | 1000517886 | Versiti Michigan Inc | Biologics | Class II | Terminated | Michigan | Traverse City | Michigan | United States | 01/20/2023 | Blood products, collected in a manner with a p... | Red Blood Cells, Leukocytes Reduced | 91510 | Class II | 198088 | CBER | https://www.accessdata.fda.gov/scripts/ires/?P... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 81692 | 3004404050 | Panera Bread LLC | Food/Cosmetics | Class II | Terminated | Nationwide | Saint Louis | Missouri | United States | 06/08/2012 | Product ingredient statement reversed for Red... | Panera ,HAZELNUT CREAM CHEESE SPREAD Reduced F... | 61831 | Class II | 109200 | CFSAN | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 81693 | 3004162088 | DSM Nutritional Products, Inc. | Food/Cosmetics | Class II | Terminated | NJ, WI, IL | Parsippany | New Jersey | United States | 06/08/2012 | Flavor is contaminated with Salmonella | GB Select Roast Meat Type Flavor Net Wt. 55 lb... | 61936 | Class II | 109523 | CFSAN | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 81694 | 3002727068 | Best West Foods | Food/Cosmetics | Class II | Terminated | NV only. | Las Vegas | Nevada | United States | 06/08/2012 | Soy was not included in the ingredient stateme... | Florentine Lasagna Rolls;\r\nPerishable, keep ... | 61968 | Class II | 109609 | CFSAN | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 81695 | 3002727068 | Best West Foods | Food/Cosmetics | Class II | Terminated | NV only. | Las Vegas | Nevada | United States | 06/08/2012 | Soy was not included in the ingredient stateme... | Cheese Lasagna Rolls;\r\nPerishable, keep froz... | 61968 | Class II | 109610 | CFSAN | https://www.accessdata.fda.gov/scripts/ires/?P... |
| 81696 | 3005339147 | Diversified Natural Products, Inc. | Food/Cosmetics | Class II | Terminated | WA,MI,NC,MN,FL,NH,AR,PA,MA,GA,TX,IN, \r\nWinni... | Scottville | Michigan | United States | 06/08/2012 | undeclared milk present in butter flavoring | AlsoSalt Sodium Free Butter flavored Salt Subs... | 61924 | Class II | 109494 | CFSAN | https://www.accessdata.fda.gov/scripts/ires/?P... |
81696 rows × 17 columns
data.isnull().sum()
| Column | Missing values |
|---|---|
| FEI Number | 0 |
| Recalling Firm Name | 0 |
| Product Type | 0 |
| Product Classification | 0 |
| Status | 0 |
| Distribution Pattern | 0 |
| Recalling Firm City | 0 |
| Recalling Firm State | 0 |
| Recalling Firm Country | 0 |
| Center Classification Date | 0 |
| Reason for Recall | 0 |
| Product Description | 0 |
| Event ID | 0 |
| Event Classification | 0 |
| Product ID | 0 |
| Center | 0 |
| Recall Details | 0 |

dtype: int64
import plotly.express as px
fig = px.pie(data,names='Product Type',
title='<b>Counts in Product Type</b>',
hole = 0.4,template='plotly_dark',
width=600,height=400)
fig.show()
data['Product Type'].value_counts()
| Product Type | Count |
|---|---|
| Devices | 29291 |
| Food/Cosmetics | 23696 |
| Drugs | 14780 |
| Biologics | 10850 |
| Veterinary | 3071 |
| Tobacco | 8 |

Name: Product Type, dtype: int64
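The counts are easier to compare as percentages of all recalled products. The sketch below rebuilds the counts reported above as a Series (the names `counts` and `shares` are introduced only for this example) and converts them to shares; on the real DataFrame, `data['Product Type'].value_counts(normalize=True)` should give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {"Devices": 29291, "Food/Cosmetics": 23696, "Drugs": 14780,
     "Biologics": 10850, "Veterinary": 3071, "Tobacco": 8},
    name="Product Type",
)

# Share of each product type, in percent of all recalled products.
shares = (counts / counts.sum() * 100).round(2)
print(shares)
```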
fig = px.histogram(data,"Product Classification",
color="Product Classification", title="<b>Count in Product Classification</b>",
template='plotly_dark',
width=500,height=300)
fig.show()
fig = px.histogram(data,"Status",
color="Status", title="<b>Status</b>",
template='plotly_dark',
width=500,height=300)
fig.show()
fig = px.histogram(data,"Event Classification",
color="Event Classification", title="<b>Event Classification</b>",
template='plotly_dark',
width=500,height=300)
fig.show()
fig = px.histogram(data,"Center",
color="Center", title="<b>Center</b>",
template='plotly_dark',
width=500,height=300)
fig.show()
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
# columns to label-encode; each fit_transform call refits the encoder on that column
encode_cols=["Recalling Firm Name","Product Type","Product Classification","Status",
"Distribution Pattern","Product Description","Event Classification","Center"]
for col in encode_cols:
    data[col]=le.fit_transform(data[col])
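LabelEncoder assigns integer codes in sorted (alphabetical) label order, which gives nominal columns such as Product Type an arbitrary numeric order. As a hedged alternative, the sketch below maps the recall class to an explicit severity rank and one-hot encodes a nominal column. The `sample` frame, the `severity_rank` mapping, and the `Severity` column are hypothetical names for this example, and the ranking assumes severity decreases from Class I to Class III, as in the FDA's recall classification.

```python
import pandas as pd

# Hypothetical mini-sample with one ordinal and one nominal column.
sample = pd.DataFrame({
    "Product Classification": ["Class II", "Class I", "Class III", "Class II"],
    "Product Type": ["Biologics", "Devices", "Devices", "Biologics"],
})

# Ordinal column: explicit ranking, higher number = more serious recall.
severity_rank = {"Class III": 1, "Class II": 2, "Class I": 3}
sample["Severity"] = sample["Product Classification"].map(severity_rank)

# Nominal column: one indicator column per category, no implied order.
type_dummies = pd.get_dummies(sample["Product Type"], prefix="Type")

print(sample[["Product Classification", "Severity"]])
print(type_dummies)
```

High-cardinality columns such as Product Description would produce tens of thousands of indicator columns, so this approach is only practical for the columns with a handful of categories.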
cmap=sns.diverging_palette(150,75, s=40, l=65, n=9)
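# numeric_only=True skips the remaining text columns, which pandas 2.x would otherwise raise an error on
corrmat = data.corr(numeric_only=True)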
plt.subplots(figsize=(10,8))
sns.heatmap(corrmat,cmap=cmap,annot=True, square=True);
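The heat map above applies Pearson correlation to label-encoded columns, so its values depend on the arbitrary integer codes. A different measure, Cramér's V, works directly on categorical columns; it is not what the notebook uses, and the sketch below is only one way to compute it. The helper `cramers_v` and the four-row `sample` frame are hypothetical, and no bias correction is applied.

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical series (0 = none, 1 = perfect)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0

# Hypothetical sample: Center follows Product Type exactly, Status does not.
sample = pd.DataFrame({
    "Product Type": ["Biologics", "Biologics", "Food/Cosmetics", "Food/Cosmetics"],
    "Center": ["CBER", "CBER", "CFSAN", "CFSAN"],
    "Status": ["Terminated", "Ongoing", "Terminated", "Ongoing"],
})

v_type_center = cramers_v(sample["Product Type"], sample["Center"])
v_type_status = cramers_v(sample["Product Type"], sample["Status"])
print("Product Type vs Center:", v_type_center)
print("Product Type vs Status:", v_type_status)
```

In this toy sample the first pair is perfectly associated and the second is independent, which is the kind of contrast an integer-coded Pearson heat map can hide.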
https://asq.org/quality-resources/recalls
https://www.canada.ca/en/services/health/food-recalls-alerts.html
https://www.eatthis.com/major-food-recalls-february-2023/
https://www.fda.gov/safety/recalls-market-withdrawals-safety-alerts
https://hayandknight.com/blog/2020/07/why-are-product-recalls-important/